UPSTREAM PR #20505: convert: Fix Qwen3.5/Qwen3.5 Moe NVFP4 Conversions #1267

Open
loci-dev wants to merge 2 commits into main from loci/pr-20505-nvfp4-fix-qwen-conversions
Conversation

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#20505

This PR fixes several errors that occur when attempting to convert Qwen3.5/Qwen3.5 Moe models. To keep this PR's scope narrow and specific, a separate PR, ggml-org/llama.cpp#20506, adds support for loading these newly converted models.

Bug:
When attempting to use convert_hf_to_gguf.py on various Qwen3.5 and Qwen3.5 MoE models, it would abort with the following error(s):

ValueError: Can not map tensor 'model.language_model.layers.0.mlp.shared_expert.down_proj.weight'
ValueError: Can not map tensor 'model.language_model.layers.0.linear_attn.in_proj_a.weight'

This occurred because these models now carry a model.language_model or language_model prefix on their tensor names. The fix strips these wrapper prefixes instead of failing, which allows the conversion to continue.
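As a minimal sketch of that stripping step (the helper name and the exact prefix handling here are assumptions for illustration, not the PR's actual code):

```python
# Hypothetical sketch of the wrapper-prefix stripping; the real change in
# convert_hf_to_gguf.py may differ in naming and placement.
PREFIXES = ("model.language_model.", "language_model.")

def strip_wrapper_prefix(name: str) -> str:
    """Strip Qwen3.5 wrapper prefixes so tensor names map as before."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            # Re-attach the "model." root that the tensor map expects.
            return "model." + name[len(prefix):]
    return name

print(strip_wrapper_prefix(
    "model.language_model.layers.0.mlp.shared_expert.down_proj.weight"))
# prints: model.layers.0.mlp.shared_expert.down_proj.weight
```

Tensor names that already match the expected layout pass through unchanged.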
However, stripping the names and continuing was not enough to convert the models properly; it exposed a new error:

RuntimeError: shape '[16, 3, 1]' is invalid for input of size 1

This is because Qwen3.5's linear attention weights get reordered in modify_tensors():

# original order:  [q, k, v, z] * head_count
# corrected order: [q * head_count, k * head_count, v * head_count, z * head_count]
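The interleaved-to-blocked reordering above can be illustrated with a toy example (a sketch only: reorder_qkvz is a hypothetical name, and the real tensors are 2-D weight matrices, not lists of row labels):

```python
# Toy illustration of the linear-attention reordering: the checkpoint stores
# components interleaved per head as [q, k, v, z] * head_count, while the
# corrected layout gathers all q blocks first, then k, then v, then z.
def reorder_qkvz(rows, head_count):
    """[q0,k0,v0,z0, q1,k1,v1,z1, ...] -> [q0,q1,..., k0,k1,..., v..., z...]"""
    groups = 4  # one q, k, v, z block per head
    assert len(rows) == groups * head_count
    out = []
    for g in range(groups):          # q first, then k, v, z
        for h in range(head_count):  # gather that component from every head
            out.append(rows[h * groups + g])
    return out

print(reorder_qkvz(["q0", "k0", "v0", "z0", "q1", "k1", "v1", "z1"], head_count=2))
# prints: ['q0', 'q1', 'k0', 'k1', 'v0', 'v1', 'z0', 'z1']
```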

However, NVFP4 bypasses modify_tensors() and does its own repacking, so linear_attn.in_proj_a.input_scale was seen as a [num_v_heads] tensor and the repacking tried to reshape it into [16, 3, 1].
This is fixed by skipping tensors in the write loop that were already repacked:

if self._is_nvfp4:
    if name.endswith(".weight") and name.replace(".weight", ".weight_scale") in self.model_tensors:
        continue
    if name.endswith((".weight_scale", ".weight_scale_2", ".input_scale", "k_scale", ".v_scale")):
        continue

(Updated: added k_scale and v_scale to the suffix list above.)

and by applying the same reordering to:

linear_attn.in_proj_qkv
linear_attn.in_proj_z
linear_attn.in_proj_a
linear_attn.in_proj_b
linear_attn.out_proj

This will now produce the correct Qwen3.5/Qwen3.5MoE NVFP4 GGUF file. A separate PR must be applied to load these files.
This fixed the issue with both Qwen3.5-122B-A10B-NVFP4 and Qwen3.5-27B-NVFP4, both of which then produced correct output.
Qwen3.5-35B-A3B-NVFP4.gguf was also tested after returning k_scale and v_scale to the skip list.

Note: for the same model, some Qwen3.5 NVFP4 HF uploads produce this tokenizer error while others don't:

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Workaround:
Edit the model's tokenizer_config.json and change tokenizer_class from TokenizersBackend to Qwen2Tokenizer
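A scripted version of the workaround might look like this (a sketch: fix_tokenizer_class is a hypothetical helper, and the demo below edits a throwaway config file rather than a real model directory):

```python
# Sketch of the tokenizer_config.json workaround described above.
import json
import tempfile
from pathlib import Path

def fix_tokenizer_class(model_dir: Path) -> bool:
    """Rewrite tokenizer_class from TokenizersBackend to Qwen2Tokenizer."""
    cfg_path = model_dir / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    if cfg.get("tokenizer_class") != "TokenizersBackend":
        return False  # nothing to change
    cfg["tokenizer_class"] = "Qwen2Tokenizer"
    cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    return True

# Demo against a temporary config file:
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "tokenizer_config.json"
    p.write_text(json.dumps({"tokenizer_class": "TokenizersBackend"}))
    changed = fix_tokenizer_class(Path(d))
    new_class = json.loads(p.read_text())["tokenizer_class"]
print(changed, new_class)
# prints: True Qwen2Tokenizer
```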

@loci-review

loci-review bot commented Mar 18, 2026

No meaningful performance changes were detected across 120755 analyzed functions in the following binaries: build.bin.llama-tts, build.bin.libllama.so, build.bin.llama-cvector-generator, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 12 times, most recently from 8c39ead to 418d9f2 Compare March 26, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 11 times, most recently from 1497621 to a67a372 Compare April 3, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 3655621 to fd3ce9d Compare April 6, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 55afbee to ef0eff4 Compare April 12, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 245e873 to d101579 Compare April 17, 2026 02:18